tidyverse?
The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures (tidyverse.org).
Its primary goal is to facilitate a conversation between a human and a computer about data (Wickham et al., 2019).
tidyverse core packages
readr: data import
tibble: modern data frame object
stringr: working with strings
forcats: working with factors
tidyr: data tidying
dplyr: data manipulation
ggplot2: data visualization
purrr: functional programming
"Tidy data sets are all alike; but every messy data set is messy in its own way" (Wickham/Grolemund, 2017).
Tidy Data Principles:
The concept of tidy data was coined by Hadley Wickham in his 2014 paper, Tidy Data.
The concept formulates principles for structuring rectangular, tabular data sets consisting of rows and columns:
Each variable forms a column.
Each observation forms a row.
Each type of observational unit forms a table.
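The three principles can be illustrated with a small sketch. The values and object names here (tidy_penguins, messy_penguins) are made up for illustration and are not taken from the palmerpenguins data.

```r
library(tibble)

# Tidy layout (hypothetical toy values): each variable is a column,
# each observed penguin is a row.
tidy_penguins <- tibble(
  id          = c(1, 2, 3),
  species     = c("Adelie", "Gentoo", "Chinstrap"),
  body_mass_g = c(3750, 5000, 3700)
)

# The same values in a messy layout: the values of the variable "species"
# are used as column headers, so one variable is spread over three columns.
messy_penguins <- tibble(
  id        = c(1, 2, 3),
  Adelie    = c(3750, NA, NA),
  Gentoo    = c(NA, 5000, NA),
  Chinstrap = c(NA, NA, 3700)
)

tidy_penguins
messy_penguins
```

In the tidy version, adding a fourth species only adds rows; in the messy version it would require a new column.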
palmerpenguins
To learn about the tidyverse, we will use data from the palmerpenguins package by Allison Horst.
The package comes with data about penguins observed on islands in the Palmer Archipelago near Palmer Station, Antarctica.
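A minimal sketch for taking a first look at the data, assuming the palmerpenguins and dplyr packages are installed. The package exports a data frame called penguins.

```r
library(palmerpenguins)
library(dplyr)

# `penguins` is the main data set exported by palmerpenguins:
# one row per observed penguin.
glimpse(penguins)

# How many penguins were recorded per species and island?
count(penguins, species, island)
```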
readr: Read Rectangular Text Data
readr provides read and write functions for several file formats:
read_delim(): general delimited files
read_csv(): comma-separated files
read_csv2(): semicolon-separated files
read_tsv(): tab-separated files
Conveniently, the write_*() functions are analogous to the read_*() functions:
write_delim(): general delimited files
write_csv(): comma-separated files
write_csv2(): semicolon-separated files
write_tsv(): tab-separated files
In addition, you can use the following packages to read data in other file formats:
readxl: Excel files
haven: SPSS & Stata files
googlesheets4: Google Sheets
rvest: HTML files
dplyr: A Grammar of Data Manipulation
dplyr provides a set of functions for manipulating data frames with a consistent grammar. Each function is named after a "verb" that describes the operation it performs.
Today, we will use the following functions from dplyr:
Operations on rows:
filter() picks rows that meet one or several logical criteria
Operations on columns:
select() picks or drops certain columns
rename() changes the column names
mutate() transforms the column values and/or creates new columns
Operations on grouped data:
group_by() partitions data based on one or several columns
summarize() reduces a group of data into a single row
magrittr: The Forward-Pipe Operator
magrittr comes with a set of operators, of which we will only use one:
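%>%
Essentially, the pipe operator aims to improve the readability of your code: it passes the result of the expression on its left as the first argument to the function on its right, so a sequence of operations reads from left to right rather than inside out.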
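The dplyr verbs listed earlier and the pipe can be sketched together on the penguins data. This is an illustrative example, not a prescribed workflow: the new column names (mass_g, mass_kg, mean_mass_kg) and the object name mass_by_species are made up here. dplyr re-exports %>%, so magrittr does not need to be attached separately.

```r
library(dplyr)           # also makes %>% available
library(palmerpenguins)

# Without the pipe, nested calls have to be read inside out.
summarize(
  group_by(filter(penguins, !is.na(body_mass_g)), species),
  mean_mass_kg = mean(body_mass_g) / 1000
)

# With the pipe, the same kind of analysis reads top to bottom.
mass_by_species <- penguins %>%
  filter(!is.na(body_mass_g)) %>%            # rows: drop missing weights
  select(species, island, body_mass_g) %>%   # columns: keep three
  rename(mass_g = body_mass_g) %>%           # columns: new name
  mutate(mass_kg = mass_g / 1000) %>%        # columns: derived value
  group_by(species) %>%                      # groups: one per species
  summarize(mean_mass_kg = mean(mass_kg))    # one row per group

mass_by_species
```

Each line of the piped version corresponds to one verb, which makes it easy to comment out or reorder single steps while exploring.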
tidyr: Tidy Messy Data
tidyr provides several functions that help you bring your data into the tidy data format (e.g., reshaping data, splitting columns, handling missing values or nesting data).
Today, we will use the following functions from tidyr:
pivot_longer(): “lengthens” data, increasing the number of rows and decreasing the number of columns.
pivot_wider(): “widens” data, increasing the number of columns and decreasing the number of rows.
ggplot2: Elegant Data Visualisations
ggplot2 is Hadley Wickham’s implementation of the ideas in The Grammar of Graphics by Leland Wilkinson (2005). It provides a large number of functions for generating high-quality graphs in a layer-based fashion and has even sparked a whole ecosystem of ‘gg’-style visualization packages.
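The two pivot functions and the layer-based plotting approach can be sketched in one short example. It assumes the dplyr, tidyr, ggplot2 and palmerpenguins packages are installed; the helper column id and the names measurement, value_mm, penguins_long, penguins_wide and p are chosen here for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(palmerpenguins)

# Lengthen: two measurement columns become one name column and one value
# column, so each penguin now occupies two rows.
penguins_long <- penguins %>%
  mutate(id = row_number()) %>%   # hypothetical row identifier
  select(id, species, bill_length_mm, bill_depth_mm) %>%
  pivot_longer(
    cols      = c(bill_length_mm, bill_depth_mm),
    names_to  = "measurement",
    values_to = "value_mm"
  )

# Widen: the reverse operation restores one row per penguin.
penguins_wide <- penguins_long %>%
  pivot_wider(names_from = measurement, values_from = value_mm)

# Layers: data and aesthetics first, then a geom, then facets and labels.
p <- ggplot(penguins_long, aes(x = species, y = value_mm)) +
  geom_boxplot(na.rm = TRUE) +
  facet_wrap(~ measurement, scales = "free_y") +
  labs(x = "Species", y = "Value (mm)")

p
```

The long format is what makes the faceting step simple: facet_wrap() only needs a single column that names the measurement.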
Let’s check out the ggplot flipbook: https://evamaerey.github.io/ggplot_flipbook/ggplot_flipbook_xaringan.html